Detecting Web Spam Based on Novel Features from Web Page Source Code
Authors
Abstract
Similar resources
Detecting Cloaking Web Spam Using Hash Function
Web spam is an attempt to boost the ranking of specific pages in search engine results. Cloaking is one such spamming technique. Previous cloaking detection methods, based on term/link differences between the crawler's and the browser's copies of a page, are not accurate enough. The latest technique is the tag-based method, which can find cloaked pages better than previous algorithms. However, addressing the c...
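The comparison of crawler-facing and browser-facing copies described above can be sketched as follows. This is a minimal illustration of the general idea, not the paper's actual algorithm; the normalization step, the SHA-256 choice, and all function names are assumptions.

```python
import hashlib
import re

def normalize(html: str) -> str:
    """Strip tags and collapse whitespace so superficial markup
    differences do not trigger false positives."""
    text = re.sub(r"<[^>]+>", " ", html)
    return " ".join(text.split()).lower()

def content_hash(html: str) -> str:
    """Hash the normalized visible text of one copy of the page."""
    return hashlib.sha256(normalize(html).encode("utf-8")).hexdigest()

def looks_cloaked(crawler_copy: str, browser_copy: str) -> bool:
    """Flag a page whose copy served to the crawler and copy served
    to the browser hash to different values."""
    return content_hash(crawler_copy) != content_hash(browser_copy)
```

Hashing the normalized text rather than comparing raw documents keeps the check cheap, at the cost of treating any textual difference, however small, as a mismatch.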
Content Trust Model for Detecting Web Spam
As it gets easier to add information to the web via HTML pages, wikis, blogs, and other documents, it gets tougher to distinguish accurate or trustworthy information from inaccurate or untrustworthy information. Moreover, apart from inaccurate or untrustworthy information, we also need to anticipate web spam, where spammers publish false facts and scams to deliberately mislead users. Creating ...
Semantic Web-based Source Code Search
The ability to search for source code on the Internet has proven essential for many common software development and maintenance tasks. However, available code search engines are typically limited to lexical searches and do not take into consideration the underlying semantics of source code, such as program structure or language. Especially object-oriented source code, which includes inhe...
Detecting Spam Content in Web Corpora
To increase the search-result rank of a website, many fake websites full of generated or semi-generated texts have been made in recent years. Since we do not want this garbage in our text corpora, this is becoming a problem. This paper describes generated texts observed in recently crawled web corpora and proposes a new way to detect such unwanted content. The main idea of the presented appro...
A Novel Approach for Web Page Classification using Optimum features
The boom in the use of the Web and its exponential growth are now well known. The amount of textual data available on the Web is estimated to be on the order of one terabyte, in addition to images, audio, and video. This has imposed additional challenges on Web directories, which help the user search the Web by classifying selected Web documents into subjects. Manual classification of web pag...
Journal
Journal title: Security and Communication Networks
Year: 2020
ISSN: 1939-0122,1939-0114
DOI: 10.1155/2020/6662166